Grounding Visual Representations with Texts for Domain Generalization

Abstract

Reducing the representational discrepancy between source and target domains is a key component to maximize model generalization. In this work, we advocate for leveraging natural language supervision for the domain generalization task. We introduce two modules to ground visual representations with texts containing typical reasoning of humans: (1) Visual and Textual Joint Embedder and (2) Textual Explanation Generator. The former learns the image-text joint embedding space where we can ground high-level class-discriminative information into the model. The latter leverages an explainable model and generates explanations justifying the rationale behind its decision. To the best of our knowledge, this is the first work to leverage the vision-and-language cross-modality approach for the domain generalization task. Our experiments with a newly created CUB-DG benchmark dataset demonstrate that cross-modality supervision can be successfully used to ground domain-invariant visual representations and improve model generalization. Furthermore, in the large-scale DomainBed benchmark, our proposed method achieves state-of-the-art results and ranks 1st in average performance for five multi-domain datasets. The code is available at https://github.com/mswzeus/GVRT.

Related resources

Temporal Generalization with Domain Generalization Graphs

This paper addresses the problem of using domain generalization graphs to generalize temporal data extracted from relational databases. A domain generalization graph associated with an attribute defines a partial order which represents a set of generalization relations for the attribute. We propose formal specifications for domain generalization graphs associated with calendar (date and time) att...

Grounding Abstractions in Predictive State Representations

This paper proposes a systematic approach of representing abstract features in terms of low-level, subjective state representations. We demonstrate that a mapping between the agent’s predictive state representation and abstract features can be derived automatically from high-level training data supplied by the designer. Our empirical evaluation demonstrates that an experience-oriented state rep...

Towards Grounding Conceptual Spaces in Neural Representations

The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. It aims at bridging the gap between symbolic and subsymbolic processing. Instances are represented by points in a high-dimensional space and concepts are represented by convex regions in this space. In this paper, we present our approach towards grounding the dimensions of a conceptual spac...

Generalization Bounds for Domain Adaptation

In this paper, we provide a new framework to study the generalization bound of the learning process for domain adaptation. We consider two kinds of representative domain adaptation settings: one is domain adaptation with multiple sources and the other is domain adaptation combining source and target data. In particular, we use the integral probability metric to measure the difference between tw...

Active Grounding of Visual Situations

We address a key problem for computer vision: retrieving images that are instances of visual situations. Visual situations are concepts such as “a boxing match”, “a birthday party”, “walking the dog”, “a crowd waiting for a bus,” “a handshake”, or “a game of ping-pong,” whose instantiations in images are linked more by their common spatial and semantic structure than by low-level visual similar...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-19836-6_3